Clustering Metagenome Short Reads Using Weighted Proteins

Authors

Gianluigi Folino

Fabio Gori

Mike S. M. Jetten

Elena Marchiori

Abstract

This paper proposes a new knowledge-based method for clustering metagenome short reads. The method incorporates biological knowledge into the clustering process by means of a list of proteins associated with each read. These proteins are chosen from a reference proteome database according to their similarity to the given read, as evaluated by BLAST. We introduce a scoring function for weighting the resulting proteins and use the weighted proteins to cluster the reads. The resulting clustering algorithm automatically selects the number of clusters and generates possibly overlapping clusters of reads. Experiments on real-life benchmark datasets show the effectiveness of the method for reducing the size of a metagenome dataset while maintaining a high accuracy of organism content.
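The abstract does not give the scoring function or the clustering procedure, so the following Python sketch only illustrates the general idea under loudly stated assumptions: each protein's weight is taken to be the sum of the BLAST bit scores of the reads that hit it, and clusters are formed greedily, one per selected protein. The function name `cluster_reads`, the parameter `min_new_reads`, and the input tuple layout are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_reads(hits, min_new_reads=1):
    """Group reads by weighted proteins (illustrative sketch only).

    hits: iterable of (read_id, protein_id, bit_score) tuples, e.g. parsed
    from tabular BLAST output. The weighting below (sum of bit scores) is
    an assumption, not the scoring function from the paper.
    """
    weight = defaultdict(float)   # protein -> assumed weight
    members = defaultdict(set)    # protein -> reads that hit it
    for read_id, protein_id, bit_score in hits:
        weight[protein_id] += bit_score
        members[protein_id].add(read_id)

    clusters = {}
    covered = set()
    # Visit proteins from heaviest to lightest; ties are broken by protein id.
    for protein_id in sorted(weight, key=lambda p: (-weight[p], p)):
        new_reads = members[protein_id] - covered
        # Keep a protein only if it explains enough not-yet-covered reads,
        # so the number of clusters is not fixed in advance.
        if len(new_reads) >= min_new_reads:
            # The cluster keeps all reads hitting the protein, so a read
            # may belong to several clusters (overlapping clusters).
            clusters[protein_id] = set(members[protein_id])
            covered |= members[protein_id]
    return clusters, dict(weight)


# Toy input: read r2 hits two proteins, so it ends up in two clusters.
example_hits = [
    ("r1", "pA", 50.0),
    ("r2", "pA", 40.0),
    ("r2", "pB", 45.0),
    ("r3", "pB", 30.0),
    ("r3", "pC", 20.0),
]
example_clusters, example_weights = cluster_reads(example_hits)
```

On the toy input, protein `pC` is dropped because its only read is already covered by `pB`, which shows how the number of clusters can emerge from the data rather than being set beforehand.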


Similar resources

Annotation of metagenome short reads using proxygenes

Motivation: A typical metagenome dataset generated using a 454 pyrosequencing platform consists of short reads sampled from the collective genome of a microbial community. The amount of sequence in such datasets is usually insufficient for assembly, and traditional gene prediction cannot be applied to unassembled short reads. As a result, analysis of such datasets usually involves comparisons in...


Unsupervised Two-Way Clustering of Metagenomic Sequences

A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length and relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes mul...


MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks

Motivation: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content directly sequenced from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is the discovery of the taxonomic composition of a given dataset. A simple method for this task, cal...


Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads

Background: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of inte...


WEVOTE: Weighted Voting Taxonomic Identification Method of Microbial Sequences

Background: Metagenome shotgun sequencing presents opportunities to identify organisms that may prevent or promote disease. The analysis of sample diversity is achieved by taxonomic identification of metagenomic reads followed by generating an abundance profile. Numerous tools have been developed based on different design principles. Tools achieving high precision can lack sensitivity in some ap...




Publication date: 2009
